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Abstract 

It is common practice in Markov chain Monte Carlo to update a high-dimensional 
chain one variable (or sub-block of variables) at a time, rather than conduct a single block 
update. While this modification can make the choice of proposal easier, the theoretical 
convergence properties of the associated Markov chain have received limited attention. 
We present conditions under which the chain converges uniformly to its stationary distri- 
bution at a geometric rate. Also, we develop a recipe for performing regenerative simu- 
lation in this setting and demonstrate its application for estimating Markov chain Monte 
Carlo standard errors. In both our investigation of convergence rates and in Monte Carlo 
standard error estimation we pay particular attention to the case with state-independent 
component-wise proposals. We illustrate our results in two examples, a toy Bayesian infer- 
ence problem and a practically relevant example involving maximum likelihood estimation 
for a generalized linear mixed model. 
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1 Introduction 



Let vr be a probability distribution having support X. The canonical Markov chain Monte 
Carlo (MCMC) method for making draws from vr is the Metropolis-Hastings algorithm, de- 
scribed here. Let X^^^ = x denote the current state, and suppose the stationary distribution 
vr has a density with respect to some reference measure fi (often Lebesgue or counting measure 
or their product), which we will also denote by vr. Let p{-, •) denote the user-defined proposal 
kernel density. The updated state X^'^^^^ is obtained via 

1. Simulate x* from proposal density p{x, •) 

2. Calculate acceptance probability a{x,x*), where 

7r{y)p{y,x 



3. Set 



a{x,y) = mm\ 1, 

TT{x)p{x,y) 



X* with probability a{x,x*) 
x with probability 1 — a{x,x* 



Thus the choice of a Metropolis-Hastings sampler boils down to choosing a proposal density 
kernel common choice is to use a proposal kernel that satisfies p{x,y) = p{y,x) 

in which case this is called a Metropolis algorithm. If, further, p{x, y) = p[x — y) = p{y — x) 
for all X and y, the sampler is called a Metropolis random walk. Another common choice of 
candidate is a proposal p{-) that does not depend on the current state of the chain, that is, 
the Metropolis-Hastings independence sampler (MHIS). 

In practice, the selection of a candidate distribution can be a challenging proposition, 
particularly in problems where the state space has high dimension. This has led to investi- 
gation of optimal scaling of Metropolis algorithms and so-called adaptive algorithms which 
allow the proposal ke r nel to change oyer the course of the simulation (see, for example. 



Bedard and Rosenthall . 120081 : iRosenthall . |2008| . and the references therein). An alternative 



approach is to, rather than update the chain as a single block, update one variable (or sub- 
block of variables) at a time . The choice between these two strategies is frequently unclear 



(see, e.g., [Roberts and Sahul . 119971 ) . although a general guideline seems to be that updating 
as a single block may not be advantageous if the components of vr are only weakly correlated. 
Further, by breaking a high-dimensional simulation problem into several smaller-dimensional 
problems, the component- wise approach can make the choice of candidate distributions much 
easier. 
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One hopes that whatever version of MetropoUs-Hastings is used, the simulation will rel- 
atively quickly produce a representative sample from the target population vr. Thus an im- 
portant consideration in MCMC is the rate of convergence of the chain to its stationary 
distribution. Specificity requires notation. Let B be the Borel cr-algebra on X and P'^{x,dy) 
denote the n-step Markov transition kernel, that is, for any x £ X, A ^ B, and n G Z+, 
P''{x,A) = Pr(X("+-') G = x) for the Markov chain ^> = {X(^\ X^^\ X(^\ . . .}. If the 

chain is Harris ergodic (i.e., aperiodic, vr-irreducible, and positive Harris recurrent), then, as 
n — > oo, •) — '7r(-)|| where || • || denotes the total variation norm. We will consider 

the rate of this convergence in the following way. Suppose there exist a real-valued function 
M{x) on X and < t < 1 such that 



-vr(-)|| < M(x)r . 



il) 



When M is bounded say $ is uniformly ergodic and otherwise say it is geometrically ergodic. 



Harris ergodicity is well-understood for Metropolis-Hastings chains. iTiernevI (|1994| ) showed 
that a Met ropolis-Hastings c hain with block updates is Harris ergod ic under weak condi- 



tions while 



Chan and Geved (|l994l ) and [Roberts and Rosenthall (j2006l ) found conditions un- 



der which variable-at-a-time chains are Harris recurrent. Also, much work has been done on 



establishing (HI) for various ver sions of Metropolis-Hastings. For example, iTiernevI (|1994| ) and 
Mengersen and Tweedid (119961 ) showed that th e MHIS sampler is uniformly erg odic if there ex- 
ists e > such that p{x) > e7r(x) for all x G X. iMengersen and Tweedid (j 19961 ) further proved 
that if essinf |p(x) /7r(x)) = in 7r-measur e , the resulting MHIS is not even geometrically 
ergodic. Moreover, IMengersen and Tweedid (119961) proved that t h e Met r opolis random walk 



on cannot be uniformly ergodic. However 



Christensen et al 



tall); 



Jarner and Hansen 



(j200d ) : IMengersen and Tweedie ( Il996l ): iRoberts and Tweedid (jl996l ) have established condi- 
tions under which Metropolis yields a geometrically ergodic chain. 

Note well that none of the s e convergence rate results apply to variable-at-a-time imple- 



Fort et al. 



2003 



Roberts and Rosenthall . Il998l . have shown that random 



mentations (though 

scan Metropolis random walks can be geometrically ergodic). Thus we begin by considering 
uniform ergodicity of such samplers in Section [2l eventually focusing on a variable-at-a-time 
version of the Metropolis-Hastings algorithm with state-independent candidate distributions. 
In particular, we find conditions under which the resulting chain converges uniformly to its 
stationary distribution at a geometric rate and apply them to a toy Bayesian example and a 
practically relevant example involving maximum likelihood estimation for a generalized linear 
mixed model. We also provide an empirical comparison of the variable-at-a-time sampler with 
the Metropolis random walk and the MHIS. 
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In addition to generally ensuring the rapid convergence required for useful simulation, ([T]) is 
also a key sufficient condition for calcu l ating asymptotically valid Monte Carlo stand ard errors 



(jFlegal et alJ . 12003; iFleeal and Joned . I2OO8I : 



Robert et al 



2002 



Jones et al. 



2009 1. One of 



the most common goals in MCMC is to evaluate the quantity E.,^g = J^g{x)TT{dx) where tt 
is a probability distribution with support X and E.,^\g\ < 00. In Markov chain Monte Carlo 



(MCMC) integration, Et^q is approximated by the ergodic average g„ 



n 



on a partial realization of a Markov chain having tt as its stationary distribution. The use 
of (jn is usually justified through Birkhoff 's ergodic theorem. Now ([1]) along with a moment 
condition on g ensures the existence of a central limit theorem for the Monte Carlo error, i.e., 

00, 



there exists < < 00 such that as n 



Vn{gn - E^g) 



N(0,a2) 



The condition ([ID also ensures that we can use the method of regenerative simulation to con- 
struct asymptotically valid MCMC standard errors. Regenerative simulation presents some 
complications not seen in block sampling algorithms, thus we develop a recipe for regeneration 
in variable-at-a-time implementations. Although we do not pursue these applications here, 
our algorit hm for implementin g regeneration also can be used to choose reasonable start- 
ing values (jHobert et al.l . l2006l ) or even as the basis of an a dapti ve c omponent-wise MCMC 



Gilks et al 



(|l998l l or 



Brockwell and Kadane 



sampl er, perhaps following the basic recipes of 
(I2OO5I 1. 

The remainder of this paper is organized as follows. In Section [2] we introduce variable- 
at-a-time implementations, consider convergence rates for two general algorithms, and de- 
velop conditions for the uniform convergence of the variable-at-a-time sampler with state- 
independent proposals. In Section [3] we introduce the method of regenerative simulation for 
estimating Monte Carlo standard errors, and demonstrate its implementation for variable-at- 
a-time samplers. In Section U] we consider two examples, a toy problem in Bayesian inference 
and a logit-normal generalized linear mixed model (GLMM). 



2 Convergence rates under variable-at-a-time updates 



2.1 Basic Technique: Minorization 



Th ere are constructive techniq ues for verifying the existence of an appropriate M and t in ([T]) ; 



see 



Mevn and Tweedid (jl993l . Chapter 15). Following is a method for establishing uniform 



ergodicity. Suppose there exist a positive integer no, a number e > 0, a set C G ;B, and a 
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probability measure Q on B such that 

> eQ{A) for ah xeC, A€ B. (2) 

We then say that a minorization condition holds on the set C, called a small set. If ([2|) holds 
with C = X, then <I> is uniformly ergodic and •) — tt{-)\\ < (1 — e)^"'^"'°^ where [-J is the 

greatest integer function. 



2.2 General variable-at-a-time updates 

Suppose X = Xi X • • • X X^; with Borel fi-algebra B. We allow each Xj C R'*' so that the total 
dimension is bi + ■ ■ ■ -\- bd- Under a variable-at-a-time, or component-wise, implementation 
of Metropolis-Hastings, the update of X^'^^ = x = {xi, . . . ,Xd), where Xi £ Xj, to X^^^^\ is 
obtained as follows. Letting i index the component of the chain due for update, a candidate x* 
is drawn from Pi{x, •), a proposal density on Xj. Then, if x* = (xi, . . . , x*, Xj+i, . . . , x^), 
the acceptance probability for the ith update is given by 

r *x . 7r{x*) pi{x*,Xi) 
ai{x, Xi = mm <^ 1, — — 

y T^{X) Pi{x,X*) 

Two ways to combine these component- wise updates involve composition and simple mixing. 
Before describing these we require more notation. We will continue to use a subscript to 
indicate the position of a vector component and a parenthetical superscript to indicate the 
step in a Markov chain, so X^^^ = x = {xi, . . . , Xd), where Xi £ Xi, denotes the current state 
of the chain. For each i = I, . . . ,d let = (xi, . . . , Xj) and x^ = (xj, . . . , x^); let x^j and 
\)Q null (vectors of dimension 0). Next, f{u,v\w) denotes a Markov transition density 
(MTD) if it is a density in v conditional on u and w; the reader should think of a Markov 
chain moving from u to v possibly conditionally on some other variable w. 

Composition of the component-wise updates corresponds to deterministically cycling through 
them one at a time. Thus the MTD for a move from X^^^ = x to X^^^^^ = y is 

d 

fcomp{x,y) = Ylfii^i^Vi I i ) (3) 

i=l 

where fi{xi,- \ xt^+^l) is the MTD for a Metropolis-Hastings algorithm on Xj having, 

given current state Xj, proposal density , xW), •), and acceptance probability 

Our first result establishes general conditions for uniform ergodicity of the Markov chain 
determined by fcomp- 
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Theorem 1. Suppose there exist positive constants £i and positive functions qi on Xi x • • • xXj 
such that 

fi{xi,yi\yii_i],x^''-^'^^) > eiqi{y\i]) 

for all xt*^ and yyf^, for each i = l,...d. Then the Markov chain corresponding to fcomp is 
uniformly ergodic. Moreover, if we let 

C= qi{xi)q2{xi,X2) ■ ■ ■ qd{xi, . . . ,Xd)^l{dx), 
Jx 

then after n iterations the total variation distance to stationarity is bounded above by (1 — 
Cei---edT. 

Proof. We will establish ([2]) for no = 1. Now 

d d 

fcomp{x,y) = > JJejgi(y[i]) . 

i=l i=l 

Let Pcomp denote the one-step transition kernel corresponding to the MTD fcomp- Then 

Pcomp 

(x, A) > El - ■ ■ £d C Q{A), 

where 

0( = 

C 



= / 91 (^^[l]) 92 (a: [2]) • • • qd{x[d])l^{dx) 



A 

□ 

In simple mixing, each update consists of just one randomly selected component-wise 
update, where the component selection probabilities do not depend on the state of the chain. 
Let Tj > and X]f=i "^i ~ 1' ^^^^^ then write the MTD for the transition X^^^ = x to 

d 

fmix{x,y) = ^rifi{xi,yi\x(^^i))I{y(^^i) = X(_i)) (5) 

i=l 

where X(_j) = x \ Xj and similarly for 

For the remainder of this paper we will focus on composition. This turns out to be suffi- 
cient because, as the following theorem asserts, if a component- wise algorithm with updates 
combined by composition is uniformly ergodic, then the sampler is uniformly ergodic under 
simple mixing as well. 

Theorem 2. Suppose there exists e > and a probability measure Q such that Pcompi^x , A'j ^ 
eQ{A) for all x £ X and A £ B, and thus the composition sampler is uniformly ergodic. Then 
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the corresponding simple mixing algorithm (with the same component-wise proposal kernels) is 
uniformly ergodic as well. Moreover, if ri, . . . ,r[i denote the component selection probabilities, 
then after n x d iterations the total variation distance to stationarity is bounded above by 
(1 - e n • • -rrf)". 

Proof. Let /^j^. denote the fc-step MTD of the mixing algorithm, given by ([5]) for k = 1. Now, 

d 



i=l 
d 

i=l 
d 

~ fcompi^i y) 

i=l 

where the inequahty follows from the fact that the right-hand side only accounts for the case 
that the updates occur in the order 1,2, ... ,d; the first equality follows from the definition of 
fmix'i and the second equality follows from the definition of fcomp- 

If we let Pmix denote the corresponding Markov transition kernel, we have 

d d 

i=l 1=1 

□ 

Remark 1. Consider the upper bounds on the total variation distance in both theorems, 
specifically, (1 — Cei • • • e^)" for fcomp and (1 — e ri • • • r^)" for /^j^. Notice that as d — > oo 
both of these bounds tend to 1. Also, in Theorem [2] the assumed minorization for Pcomp 
implies a total variation upper bound of (1 — e)". Of course, the upper bound for Pmix is 
larger, that is, (1 — e)" < (1 — e ri • • • r^)", but since these are just upper bounds on the total 
variation distance we caution against using them to compare the two Markov chains. 



Remark 2. [Roberts and Rosenthall (|1997| . Proposition 3.2) proved a result that has a similar 
flavor. Specifically, they showed that if the deterministically updated Gibbs sampler is uni- 
formly ergodic, then so is the so-called random scan Gibbs sampler when the component- wise 
selection probabilities are equal. Note that equal selection probabilities r, = l/d achieves the 
tightest upper bound available from Theorem [2] by maximizing the product ri • • • r^. 
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2.3 Component-wise independence sampler 



In a component- wise independence sampler (CWIS), the component- wise proposal kernels 
Pi{x, •) do not depend on the current state of the chain x. Again letting i index the component 
of the chain due for update, a proposal x* is drawn from Pi{-), a density on Xj. Then, if 
X* = {xi, . . . , Xi-i, X*, Xj+i, . . . , Xd), the acceptance probability for the ith. update is given by 

ai[x, Xj j = mm < 1, 



7r{x) Pi{x*) 

We are interested in establishing conditions under which the CWIS is uniformly ergodic. 
Clearly the conditions of Theorem [1] may be difficult to verify in practical applications. Fur- 
ther, as a CWIS update will likely be some combination of accepted and rejected component- 
wise proposals, the CWIS is not tru ly an independence sampler at all, and thus the results 
from iMengersen and Tweedid (|l996l ) are not directly applicable. It is tempting to think 



that extending their work to component-wise updates will be straightforward. If we let 
P{^) — nf=iPj(^«)i ^ density on X, is the existence of e > such that p{x) > e7r(x) a 
sufficient co ndition for uniform ergodi ci ty of C WIS? It is not at all clear that it is. Attempts 



to generalize IMengersen and Tweedid 's (j 19961 ) argument break down with the proliferation of 
cases to consider. In Theorem [3] we give a pair of conditions that together are sufficient for 
uniform ergodicity of CWIS. 

Recall the notation previously defined. We use a subscript to indicate the position of a 
vector component and a parenthetical superscript to indicate the step in a Markov chain, so 
j^(fc) — rj. — ^ ^ ^ ,Xfi), where Xi € Xj, denotes the current state of the chain. For each 
i = 1, . . . , d let xjj] = (xi, . . . , Xj) and x^ = (xj, . . . , x^); let x^j and x''^'^^' be null (vectors of 
dimension 0). Now an explicit statement of the CWIS update rule for X^^^ = x is 

1. For each i = 1, . . . , d, 
(a) Simulate x* ~pj(-) 



(b) Calculate 



i{{y[ 



ai[(y\i-^},x^'),x, = mm' 



' vr(y[i_i],Xi,x[^+il)K(x*) 



(c) Set yi = 
2. SetXe^+i) = (yi,...,yrf) 



X* with probability 
Xj with probability 1 — aj 
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Theorem 3. Consider a CWIS on state space X = Xi x • • • x with target density tt and 
proposal densities pi for i = I, . . . ,d. Define p{x) := Wi=iPi{xi) , a density on X, and suppose 
p{x) = if and only if tt{x) = 0. Further suppose there exists 5 > such that p{x) > 6'tt{x) 
for all X £ X. Finally, suppose there exists e > such that for any x,y G X with 7r{x) > 
and Tr{y) > 0, 

Tr{x)TT{y) = TT{x[i_i],x^^)TT{y[,_i],y^^) > e7r(x[i„i] , y[^V(y[i-ih ^^t^^) > (6) 

for each i = 1, . . . ,d — 1. If we let Pcwis denote the Markov transition kernel, then, for any 
X gX and AG B{X), 

Pcn,rs{x,A)>5e^''l^^7:{A) 

where [-J denotes the greatest integer function, and thus the Markov chain is uniformly ergodic. 
Moreover, the total variation distance to stationarity after n iterations is hounded by (1 — 

Proof. See Appendix |X1 □ 

Remark 3. As in Section 12.21 we see that the total variation upper bound for the CWIS 
approaches 1 as d ^ oo. Moreover, the upper bound for CWIS is larger than that of the 
MHIS which is (1 — 5)" in the notation of the theorem. However, we will encounter an 
example in Section 14.21 where, despite the larger upper bound, CWIS is convincingly better 
than the MHIS. 

The following two corollaries indicate settings under which ([6]) is easily verified. 

Corollary 1. Consider a CWIS on state space X = Xi x • • • x X^ with target density tt and 
proposal densities pi for i = I, . . . ,d. Define p{x) := Y\i=iPi{xi) , a density on X, and suppose 
p{x) = if and only if t:{x) = 0. Further suppose there exists 5 > such that p{x) > 5t:{x) 
for all X G X. Finally, suppose there exist pairs of functions gi and hi on Xj for i = 1, . . . ,d 
such that 

d d 

'[[g^{Xi)<^T{x)<'[[h,{x^) (7) 

1=1 1=1 

for any x £ X, and sup^^^^^ {f^ii^i) / 9ii^i)} < °° /^'^ ^ac/i i = l,...,d. Then the Markov 
chain is uniformly ergodic. 

Proof. We need only show that ([7D implies Let pi = ini^^^^x^ {gi{xi)/hi{xi)} for i = 
1, . . . , d; by assumption each pi > 0. Then, for any x,y gX and each i = 1, . . . ,d we have 

7r(x[^_i],x[^V(7/[,_i],j/H) ^ T=r gj{xj)gj{yj) yr 9jiVj)9j{xj)_ > A 2 
^(x[,_i],yW)^(y[,_i],xW) - f}^hj{xj)hj{yj) f}. hj{yj)hjixj) " f}^^^ ' 
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Thus Q holds with e = {pi--- pdf . 



□ 



Remark 4. An immediate consequence of Corollary [T] is that if the target density tt can be 
expressed as a product of d densities (each vTj a density on Xj), that is, if the components of 
a random vector X ~ tt are mutually independent, a CWIS with p{x) > 5t:{x) is uniformly 
ergodic. Further, since ([6]) holds for e = 1 in this case, the upper bound on total variation 
distance to stationarity is (1 — (5)", the same bound as can be obtained for the MHIS. 

Corollary 2. Consider a CWIS on state space X = Xi x • • • x with target density tt and 
proposal densities pi for i = I, . . . ,d. Define p{x) := Y[i=iPii^i)' ^ density on X, and suppose 
p{x) = if and only if tt{x) = 0. // there exist < a < b < oo and c < oo such that 
a < Tr{x) < b and p{x) > c for n-almost all x, then the chain is uniformly ergodic. 

Proof. The conditions of Theorem [3] hold with 5 = c/b and e = (a/b)'^. □ 

Remark 5. Although, in the interest of brevity, we have not stated it here, using Theorem[2]it is 
straightforward to see that the corresponding simple mixing algorithm with state-independent 
proposals is uniformly ergodic under the conditions of Theorem [3] and Corollaries [U and El 



2.4 Metropolised Gibbs 

In a Gibbs sampler, the update of X^''^ = x = {xi, . . . ,Xd), where each Xj € Xj, to X^^'^^'^ = 
y = (yi, . . . , yd) is conducted by, for each z = 1, . . . , d, taking yi to be a draw from the condi- 
tional density vrj(-|y[j_i],xt*^^]). If the means to simulate from some or all of the conditional 
densities 7ri(xi|x(_j)) are unavailable, one might apply the Metropolis-Hastings idea to the 
component- wise updates. Letting x^), •) denote the proposal density for the iih. 

component, this "Metropolised" Gibbs algorithm proceeds by, for each i = 1, . . . , d, 

1. Simulate x* - Pi((?/[i-i], a;^), •) 

2. Calculate 



3. Set 



ai = mm < 1, — ; — ; rrrrr r, rr. ^ } (8) 

1 '7ri(xi|y[i_i],x[*+il) Pi{{yii^ii,xi^]),x*) ^ 



X* with probability ai 
Xi with probability 1 — ai 



Of course, this approach is identical to the composition approach to variable-at-a-time updates 
of Section I 
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Theorem 4. The Metropolised Gibbs sampler with Metropolis-Hastings proposal densities 
Pi((y[j_i] , xW), •), i = 1, . . . ,d, is equivalent to fcomp with proposal densities pi((y[j_i],xW), •). 

Proof. Proposal densities are identical, so we need only show that acceptance probabilities 
are equivalent (which they must be, as both algorithms are constructed to have the same 
invariant distribution). If we let x* = (xi, . . . , Xi-i, x* , Xi^i, . . . , Xd), we have 

tt{x*) _ 7r_j(x(_^))7r^(x*|x(_j)) _ 7ri(x*|x(_j)) 
tt{x) vr_i(x(„i))7ri(xi|x(_j)) 7rj(xi|x(_j)) 

and the equivalence of ^ and ([8|) follows immediately. □ 



3 Regenerative simulation 



Here we consider a generalization to the minorization condition introduced in Section 12. H and 
assume it holds for uq = 1. Specifically, we suppose there exists a function s : X — > [0, 1] with 
EtjS > and a probability measure Q on B such that 



P{x, A) > s{x)Q{A) for all x £ X and A £ B . 



(9) 



The method of regenerative simulation (RS) requires simulation of the split chain {(X^"^, 6^"'^)}, 
generated as follows. Given X^''^ = x, find X^''^^^ and d^'^^ by 



1. Simulate X^^+i) ~ P{ 



X, 



2. Simulate J^'^^ a Bernoulli random variable with success probability r(X('=), 
where 

s{x)Q{dy) 



r{x,y) 



(10) 



P{x,dy) 

denotes the conditional probability of regeneration given a jump from x to y in the 
chain. 

By construction, the sub-chain jX^")} has Markov transition kernel given by P. Also, the set 
of n for which §^"~^^ = 1, called regeneration times, represent times at which the chain "prob- 
abilistically restarts itself." An important application of regenerative simulation is Mont e 
Carlo standard err o r estimation, as de s cribe d in Appendix iBl see an y of lHobert et al.l (|2002l ): 



Jones et al 



(120061 ): iJones and HobertI (|200lh : iMvkland et al.l (|l995l ) for further details. 

Now consider a variable-at-a-time Metropolis-Hastings algorithm having transition density 
defined at ([3]), i.e., 

d 



fcomp{x,y) = Wfi{xi,yi I y[i-i],x[*+^]) 



i=l 
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where each fi{xi, ■ \ x'*"^"^]) is an MTD with proposal density x^), • ). Assume 

that for i = 1, . . . ,d there exist positive functions Sj on Xj x • • • x and on Xi x • • • x Xj 
such that 

Pi{{y[^-l],X^^),yi) > S^{x^'^)q^{y^i]) . (11) 

Next, assume that there are positive functions Wi for i = 1, . . . ,d such that 

■Wi{x)pi{x,yi) = Wi{y)pi{y,Xi) for all x,y G X . (12) 

Remark 6. For the CWIS note that ([TT]) holds with Sj(xW) = 1 and qi{y[i\) = Pi{yi) and (fT2]) 
holds with Wi{x) = Pi{xi). 

Set ri{x) = ^{x) /wi{x) and also suppose there exists a set of functions gii,gi2, hn, hi2 for 
i = 1, . . . ,d such that for any x with tt{x) > we have 

< 5'ii(a:[i])5'i2(2;'*+^l) < ri{x) < hii{x[i_i])hi2ix^'^) for each i = l,...,d (13) 

and assume further that 

Engiigi2 > for each i = 1, . . . ,d . (14) 

Remark 7. Consider the conditions (jl3p and (jl4p . which are not as restrictive as they may 
at first appear. Indeed these conditions hold quite generally for bounded target densities. 
For example, consider the CWIS setting and take Wi{x) = Pi{xi) as above. For each i = 
1, . . . , d — 1, select a point G Xi x • • • x Xj, and a set Di C Xj+i x • • • x X^ such that 
inf {7r(2;) : x^'+^l G A} > 0. Then take 

vr(x[,],a;[^+i]) " 
7r(x[j],x[*+i])J 

and 

ff.2(x[^+i])=7r(x[,],x[^+i])/D,(x[''+^l). 
Assume sup7r(x) < oo. We can take 

/iii(x[i_i]) = sup 7r(x[j_i],a;W) 

and /ii2(xW) = l/pi(xi), and (fT3]l holds. Finally, (fUj) holds by definition of the Di. 

We now show that (|lip -p4 p together are sufficient to guarantee a minorization. First, 
note that 

d 

fcomp{x,y) > J|pi((y[i_i],xW),yi)Qi((y[i_i],a;W),yi) 
1=1 



9iiix[{\) 



1 



inf 
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as the right-hand side only accounts for the case where every component- wise update proposal 
is accepted. Also, 

, . j, vr(y[,],x[-+i])p,((y[,],:r[-+il),x,) ] 
= min<^ 1, — uTT^T uK T } 



mm 



mm 



> min 



> min 



\ 9^l{yl^])9^2ix^'^'^) \ 

' hii{y[,_i])hi2{xi'^) J 

' ^ii(y[i-i])J 1 ' Cihi2{xi'^) 



where the equality follows from p2p , the first inequality follows from (jl3p and the second is due 
to the fact that for any nonnegative real numbers a and b, min {1, ab} > min {1, a} min {1, b}, 
and Cj is any positive constant. Now (|lip implies 

Wx,y) > n.^.(-t^Vin|l, ^1^} '?.(%l)min{l, 
This implies a minorization condition ([9]) with 



and probability density 



.(x)ocTT^.(^'^^)min 1, ^-f" , (15) 



.(.)ocn'Z.(y[,)mm{l,^J^} . (16) 



Note that some care must be taken in the selection of the Si and the gii,gi2 to ensure that 
Ej^s > 0, as required for the minorization condition. We will not need the normalizing constant 
for q since the probability of regeneration only depends on the product of s and q. We now 
give the update rule for the component-wise split chain. Letting X^'^^ = x denote the current 
state of the chain, X^^^^^ and 6^^^ are found by 

1. Generate X^'^'^^^ = y according to the component- wise update rule or fcomp{x, •) 
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2. If even a single component- wise update proposal was rejected, set 6^^^ = 0. If every 
component-wise update proposal was accepted, let 5^''^ be a Bernoulli random variable 
with success probability 



vr (y [j_ 1] , a; W ) Pi ( (y [j_ 1] , X W ) , ) 



Pj((y[j_i],xl*l),yi)mm < 1, — u — — — 



We thus have the following theorem, which extends iMvkland et al.r s (|l995l ) Theorem 2 to 
component-wise updates, guaranteeing that the above update rule yields the correct regener- 
ation probability (fTO|) . 

Theorem 5. A component-wise sampler with MTD fcomp having target density vr satisfying 
pip - ()14p has a minorization condition P{x^A) > s{x)Q[A) for all x £ X and A E S{X). 
Further, in the component-wise split chain algorithm defined above, the probability of a regen- 
eration occurring on a jump from x to y is given by 

s{x)Q{dy) 



Pr((5W = = = y) 



P(x, dy) 



Proof. The first assertion was proved above. Let A^ denote the event that every single 
component-wise update proposal is accepted in the k -\- 1 update. Since is zero if even a 
single update proposal is rejected, we have for any x,y £ X, 

and thus 

Pr(5(^') = = X, =y)= Pr(^fc, 5^=) = IjxC^) = x, = y) 

= Pr(5(^') = = x,X^''+^^ = y) 

X Pr(^fc|X('=) =x,X(^'+i) =y) 
_ , ^ niLiPi((y[i-i],3;W),yi)ai((y[i_i],xW),yi)/i((iy) 

- '^^"'"^ ' n^) 

^ s{x)q{y)fi{dy) ^ s{x)Q{dy) 
P{x,dy) P{x,dy) 

where is given by p!7|) . and q and s are given by (fT6|) and (fT5]) . respectively. □ 

Remark 8. Implementation of RS in the algorithm defined by MTD fmix is challenging. In 
particular, we were unsuccessful in establishing a minorization of the form Q. However, it is 
at least theoretically possible to implement RS using the minorization developed in the proof 
of Theorem [2j Of course, this is not appealing from a practical point of view. 
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4 Examples 



4.1 A toy problem in Bayesian inference 

Let Yi,...,Ym be i.i.d. Normal(/i, 0), and let the prior on {fj.,9) be given by the density 
7ro{iJ.,6) oc lA{^i)lB{&)/^-, where A and B are Borel subsets of M and M"*", respectively. 
Consider the problem of simulating an ergodic Markov chain with stationary distribution 
given by the posterior density 



where y = ^ Yl^i Vi •^^ — YllXi{yi~yY ■ IJones and Hoberti (|200ll ) showed that, provided 



m > 5, the Gibbs sampler is geometrically ergodic, even with ^ = M and B = Here we 
propose a Metropolis-Hastings independence sampler that we show to be uniformly ergodic 
provided A is bounded, and a component-wise independence sampler that we will show is 
uniformly ergodic provided A is bounded and B is bounded below away from zero. Then we 
show how to use RS in both samplers and conclude with an empirical comparison. 



4.1.1 An independence sampler 



If (fi^^^O^^^)) = (^,9) £ AxB denotes the current state of the chain, the update (^('^+^\ ^('^+^)) 
is found by 



1. Simulate proposals fi* ~ N(y, ^) on A (a truncated normal distribution) and 0* ~ 
IG(^^^, ^) on B (a truncated inverse gamma). Simulating the proposals is easily done 
by rejection sampling. 

2. Calculate the acceptance probability a = min|l, '^l^^^'g^^ J'^^'gl^ |i where p{ji,9) is the 
product of the two univariate proposal densities. The acceptance probability reduces to 
the minimum of 1 and 



exp 



m 



(^^-y?-[z^-l^^(l^*-yf 



3. Set 



{fi*,9*) with probability a 
(n, 9) with probability 1 — a 



To prove that this algorithm is uniformly ergodic we will show the existence of e > such 
that 9) > en^fi, 9) for all {fj.,9) £ A x B . Suppose A is bounded, and let /i satisfy 
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1/2 - y| = sup^g^ l/x - y| < oo. Now 

Thus the inequaUty holds for e = /cexp { — ^(/2 ~ y)^} > where /c is a ratio of normalizing 
constants. 



To simulate the MHIS split chain we follow the recipe of lMvkland et alj (|l995l ). The con- 
ditional probability of regeneration on a jump from (fi' , 6') to {fi, 8), given that the Metropolis- 
Hastings proposal was accepted, is 



rAix,y) 



{^'1^} <f^',o')>cp{t,',e'), ^{^^,0)>cp{f,,e) 

1 otherwise . 



mm • 



We take the constant c > to be the median value of 7r(//, 9)/p{^, 9) from a preliminary run. 
4.1.2 A component- wise independence sampler 

In a CWIS the proposals /i* and 9* are considered independently, allowing the possibility that 
one component of the chain might move while the other does not. Let {fi^^\9^^'^) = (/i, ^) G 
Ax B denote the current state of the chain. The updated value found by 



1. Simulate n* from candidate density pi{-) ~ N(t/, ^) on A (truncated normal distribu- 
tion). 



2. Calculate the acceptance probability ai = min|l, ^^[^ viil^*) } ' ^'^^'^^ reduces to the 
minimum of 1 and 



3. Set 

/i* with probability ai 
f.1 with probability 1 — ai 



If we now let (/i('=+i),6iW) = (1^,9) e Ax B, we find 9^^=+^^ by 

1. Simulate 9* from candidate denstiy p2(") ~ IG(^^^, ^) on B (truncated inverse gamma 
distribution). 
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2. Calculate acceptance probability 02 = min |l, '^^^^g^ P2(d*) } ' '^'^i*^^ reduces to the min- 
imum of 1 and 



3. Set 



9* with probability 02 
9 with probability 1 — 02 



To show the above CWIS is uniformly ergodic, we must find 5 > such that pi{fJ-)p2{0) > 
5TT{fi,9), and e > such that ([6]) holds. Assuming A is bounded, let jl satisfy \fl — y\ = 
sup^g^ \fi — y\ < 00, and the former condition holds with 5 = /cexp { — — y)^} > by 
the argument of the previous subsection. Further, 

If we also assume B is bounded below away from zero, and denote the infimum of -B by 0* > 0, 
then ([6]) holds with e = exp | — "^^20^^ | ■ Uniform ergodicity follows from Theorem [3j 

Now consider implementing RS. Let ri(/x,^) oc Tr{fi,9)/pi{fi) and r2{n,9) oc TT{fj,,9)/p2{9). 
For the CWIS split chain we seek pairs of functions 51,52 and /ii,/i2 such that ri{fi,9) > 
gi{li)g2{9) and r2(/i,^) < hi{ii)h2{9). Now 

/ n\ f m + 1 m/1 1\ _2 

n(/.,e) = exp I ^log^ - ^ - T (^^ - - 

mfl l \ , .2^ f m + 1 , ^ 



for any choice of 9 G B, suggesting a sensible choice of gi and (72- Also 

r2(//,6l) =exp|-^(^-y)2| 

and thus the desired inequality holds for hi{p) = 1 and h2{9) = 1. 

The conditional probability of a regeneration on a jump from {/j,' , 9') to (/x, conditional 
on both component- wise proposals being accepted, is given by 

92(0') \ . , u ■ /1 1 \ • /1 

, I mm {1, 51 (^)} mm 1 1, ^ I mm 1 1, ^ 



mm < 1, — , . > mm < 1. 



rA{{^^',9'),{^l,9)) 

^'ri(^',0')/ """r'r2(/i,0O 
with the components of this expression as defined above 
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4.1.3 An empirical comparison 

Let m = 10, y = 10.2, and = 6.5, and suppose our goal is to estimate the posterior mean 
inverse coefficient of variation E{^/^f9\y). 

We let A = (0, 100) and B = and implemented the MHIS of subsection 14.1.11 starting 
the chain at (/uW,6lW) = (10,1). Results are shown in Figure [TJ Autocorrelation tails off to 
zero by lag 10 or so. 

We let A = (0, 100) and B = (.01, oo) and considered the CWIS of subsection 14.1.21 again 
starting the chain at (/i(°), 6'(°)) = (10,1). As Figure [2] demonstrates, the autocorrelation 
function for CWIS tails off more rapidly than did that of the MHIS. 

To study the performance of the regenerative simulation approach to MCMC standard 
error estimation (see Appendix [B]) we conducted the following simulation study. For each 
sampler we produced 20,000 chains of a fixed length n. From each realized chain we simu- 
lated the associated split chain using the techniques described above, and computed a 95% 
confidence interval for the quantity of interest based on (|24 P " (|26|) with g{ix,0) = n/^fO. The 
"partial tours" at the beginning and end of each run, those draws that preceded the ffi'st re- 
generation and those that came subsequent to the last observed regeneration, were discarded. 
We note that discarding the partial tour at the end of the run introduces a bias, but it will 
be negligible in this simple example. We can compare MHIS to CWIS with respect to inter- 
val widths and realized coverage rates. We considered n = 5000 and n = 1000; results are 
summarized in Table [TJ 

The last two columns of Table [1] describe the frequency of regeneration for each sampler. 
The mean tour length for MHIS was 2.57, and that of CWIS was 5.22. RecaU that CWIS 
permits a regeneration only if both component- wise proposals were accepted, thus it is not 
surprising that CWIS regenerates less frequently than MHIS. 

While high frequency of regeneration is desirable, this is not the correct criterion on 
which to be comparing samplers. The middle columns in Table [1] report coverage rates and 
interval half- widths. While CWIS produces less frequent regeneration than MHIS, it also 
yields narrower confidence intervals, with no discernible sacrifice in coverage probability. 

4.2 A logit-normal GLMM 

In a generalized linear mixed model (GLMM) the distribution of the observable data is spec- 
ified conditionally on an unobserved vector of random effects. Consider a GLMM in which, 
conditional on [/ = n, the observations Yij are independently distributed as Bernoulli(pij), 
where logit(pij) = (3xij + for j = 1, . . . , mj and i = 1, . . . ,q, where the Xij are covariates. 
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Let the random effects Ui, . . . ,Ug be i.i.d. Normal(0, cj^). iMcCullochI (Il997l ) discusses tliree 
Monte Carlo algorithms for finding maximum likelihood estimators of the unknown parame- 
ter = cj2). A common feature is that all three require simulation from the same target 
distribution, namely the conditional distribution of the random effects given the data. We 
thus wish to simulate an ergodic Markov chain with stationary density 



7r(M|y; 9) oc exp < 



E 



l + e 



2a2 



(18) 



,q- 



McCulloch 



where yi+ = YJj=\ Vij ^ = 1; 
a Monte Carlo approximation to the so-called Q-function 



s (|l997l ) Monte Carlo EM algorithm requires 



Q{e;i 



lciO;y,u)Tr{u\y;9)du 



where 



■,y,u) 



q rrii 



l + e 



=1 j=i 



1 

2^ 



E 



denotes the "complete-data log-likelihood," what the log-likelihood would be if the random 
effects were observable. 

In the following we propose three samplers having target density vr(u|y; 9). We begin with 
a Metropolis-Hastings random walk sampler which we show to be geometrically ergodic; next 
we consider an independence sampler which we show is uniformly ergodic; and finally we 
consider a component-wise independence sampler which is uniformly ergodic. For the latter 
we discuss the simulation of the split chain. Further, we include an empirical comparison of 
the three samplers, and report the performance of regenerative simulation in the CWIS. We 
conclude with a comment on the use of RS in the Metropolis random walk. 



4.2.1 Metropolis random walk 



A Metropolis random walk with normally distributed jump proposals, that is, 

p{u, u*) oc exp I — — "Up 



(19) 



where | • | denotes th e standard Euclidean norm , is geometrically ergodic, as we prove here 
using Theorem 4.3 of iJarner and HansenI ([20001). We must show that 

M-^V log 7r(n) 



lim 

|«|— >oo 



-OO 



\u\ 
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and that 



U"^Vlog7r(u) 
iim sup -— 

\u\-*oo m |Vlog7r(n) 



< 



^ ^l"") = ~ Pi+ ~ P"' where pi+ = Y1T=1 P^^i' for i = 1, . . . , g, and thus 



Next, 



|m|^oo 



4 lim 



-— :r hm \u\ 



-oo . 



hm sup - — -j- = hm sup 



|«Hoo |Vlog^(n)| .1/2/ q , 

1 V^'J ,,2 



1/2 



hm sup ■ 



hm sup 1 — ^ = — 1 



4.2.2 An independence sampler 

Consider a Metropohs-Hastings independence sampler with proposals drawn from the random 
effects' marginal distribution, p{u). The acceptance probability for a proposed jump from u 
to u* reduces to the minimum of 1 and r[u*) /r{u), where r(n) oc 7t{u)/p{u), the ratio of the 
target density to the proposal density. In the logit-normal example we can take 



r(n) = exp ^ 



i=l 



UiVi. 



^ log (l + e'^^'^ 



(20) 



Recall that an MHIS is uniformly ergodic if and only if there exists e > such that p{u) > 
e'K{u) for all n; equivalently, an MHIS is uniformly ergodic if and only if r[u) is bounded. It 
is easily verified that r{u) < exp |— /? ^ ■ 2;jjyjj| for all u £ W, and thus our MHIS is 
uniformly ergodic. 



4.2.3 A component-wise independence sampler 



In 



McCulloch 



s (|l997l ) implementation of the Monte Carlo EM and Monte Carlo Newton- 
Raphson algorithms, simulation from the target distribution is accomplished by a component- 
wise independence sampler with (univariate) proposals drawn from the marginal distribution 
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of the random effects. Specifically, for the ith component-wise update, we generate a proposal 
u* ~ Normal(0, cr^); the acceptance probability is again equal to the minimum of 1 and 
r{u*)/r{u), where u* = (ui, . . . , «*, Uj+i, . . . ,Ug) and r is given by ([20]) . The CWIS is 
also uniformly ergodic, an immediate consequence of our Corollary [1] and the fact that the 
random effects are conditionally independent given the data. 

Regenerative simulation is straightforward in this problem because psp holds with equality 
on both sides, again as a result of component-wise independence in the target distribution. De- 



ri Ui,; 



fine ri{ui) = exp |uiyi+ - log(l + e'^''*^+"»)| for i = 1, . . . , g, so that r{u) = 
The conditional probability of regeneration on a jump from u' to n, conditional on every 
component-wise proposal being accepted, is 



rA{u',u) = Y[ 



mm 



1 > min 



mm 



or 



rA{u',u) = Y[ 



mm 



mm 



Ci ' Ci 

ri(u[) ' ri{ui) 
1 



rj(n-) < Ci,ri{ui) < Ci 
ri{u'i) > Ci,ri{ui) > Ci 



otherwise . 

With the goal of maximizing the frequency of regeneration, it makes sense to take the Cj to 
be the median values of rj(uj) from a preliminary run. 



4.2.4 Simulation study 

Suppose Xij = j/15 fo r each j = 1, . . . , m, = 1 5, for each i = 1, . . . ,q = 10. We consider a 
data set generated bv iBooth and HobertI (|l999l . Table 2), assuming 6 = cr^) = (5.0,0.5). 



Suppose we want to evaluate Q{6;6) for 6 = 6 = (4.0,1.5). We can take as a Markov 
chain Monte Carlo approximation the ergodic average of the chain |Zc(^; 'U^*'^)} where 
: /c = 1,2, . . .} is an ergodic Markov chain with stationary density Tr{u\y;9) as defined 
by m- 

We ran each of the three Metropolis-Hastings algorithms discussed above, in each case 
simulating a chain of length n = 10^ and taking as our initial distribution [/(o) ~ Nq(0,cj2/). 



For the Metropolis random walk we drew our jump proposals from a Nq(0, r'^/), with = 
fj^/lO (this setting determined by trial and error, in order to minimize the autocorrelation 
in the resulting chain, and yielding an acceptance rate of 39.18%). A partial trace plot 
(the second 1000 updates) and the sample autocorrelation function are shown in Figure [3l 
Analogous plots for the MHIS and CWIS follow, in Figures 3] and O respectively. 
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Our most striking result is the dreadful performance of the MHIS. It is worth recalling 
that, as we proved above, Figure H] depicts a uniformly ergodic Markov chain! Thus this 
example nicely illustrates the perils of over-reliance on asymptotic properties of a sampler, 
which provide little guarantee of favorable performance in finite-sample implementations. The 
Metropolis random walk (geometrically ergodic) mixes much faster than the MHIS, but still 
shows significant autocorrelation. The CWIS is, by a very wide margin, the best of the three 
samplers considered. 

To study the performance of regenerative simulation in this setting we propose the fol- 
lowing experiment. Using the CWIS algorithm defined above, run a chain for a fixed number 
of regenerations R, and calculate a 95% confidence interval for Q{9;9) based on the RS ap- 
proach to MCMC standard error estimation (see Appendix [Bl specifically, we use (p^ - ([26]) 
with g{u) = lc{9',y,u)). Repeat a large number of times, noting coverage rates and interval 
widths. 

We considered R = 50, R = 25, and R = 10, and generated 1000 chains for each. Results 
are summarized in Table [2l The right-most columns summarize the observed chain lengths 
(note that this experiment has the opposite design of the previous example, in that the 
number of regenerations is fixed and the total chain length is random). The overall average 
tour length is iV = 10,732, a result of the fact that a regeneration in CWIS requires that 
every component-wise proposal be accepted. Because the tours are so long, one might expect 
regenerative simulation to yield accurate results for even modest values of R. For R = 50 and 
R = 25 the attained coverage rate is in fact reasonably close to the nominal confidence level. 
For R = 10 the method undercovers. 



4.2.5 Regeneration in Metropolis random walk 



The reader might wonder why we did not conduct a similar study for the Metropolis random 
walk sampler. As we will show, it is not entirely clear how to impl ement regenerat iv e sirn - 
ulation in high-dimensional Metropolis random walks; in particular, iMvkland et al.r s (|l995l ) 
approach is often not viable, as we demonstrate here. 



The RS recipe of iMvkland et al.l (jl995l ) requires functions s and q such that p{u,u*) > 
s{u)q{u*) for all u,u*. The conditional probability of regeneration on a jump from U^^'^ = u' 
to [7^'^+^) = n, given that the Metropolis jump proposal was accepted, is given by 



rA{u ,u) 



s{u')q{u) 
p{u', u) 



X < 



mm 



7r(u') n{u) 



mm s I i\ , — 

1 



tt{u') < c, 7r{u) < C 
7r(n') > c, 7r{u) > c 
otherwise . 
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To find appropriate functions s and q iMykland et alJ (jl995l ) suggest choosing a point u and a 



set D and taking s{u) — inf^j* ^d[p{u,u*)/p{u,u*)] and q{u*) = p{u,u*)Id{u*). Intuitively, it 
makes sense to take u to be some a priori estimate of the "center" of the target distribution, 
perhaps the ergodic average from a prehminary run. We might then take to be a hypercube 
centered at n, that is, D = {u : \ui — Ui\ <hi z = 1, . . . , d} for some bi, . . . ,bd > 0. For the 
Metropohs random walk p{u,u*) is given by ()19p . Then 



s{u')c 



p{u', u) 



f 1 " 1 ' 

exp <^ 2 ~ Ui)[ui -Ui + bi sgn(n- - Uj)] > I{\ui - Ui\ < bi) (21) 

{ i=i ) i=i 



where sgn(z) = z/\z\ for z 7^ 0. 

We ran a chain of length n = 10^ and observed 391,804 accepted proposals. We take the 
Ui to be the mean values from a preliminary run. The choice of the bi entails a Goldilocks 
problem: we want to observe \ui—Ui\ < bi with high frequency, but ([2T|) depends on an infimum 
taken over the set of all such u. We considered six different values of the bi (six different 
multiples of the standard deviations from a preliminary run); results are summarized below. 
The middle column indicates the proportion of the 391,804 accepted proposals for which each 
\ui — Ui\ < bi, i.e., the proportion of the time that (j2ip was non-zero. The right-most column 
indicates the average non-zero value of ([2T]) . 



bi 


%{ tij - Ui\ < bi} 


m 




0.3 SBi 


0.00 






0.5 SDi 


5.10 X 10-^ 


0.00107 




1.0 SDi 


0.0193 


2.62 X 10" 


06 


1.5 SDi 


0.2269 


6.77 X 10- 


09 


2.0 SBi 


0.6113 


7.88 X 10- 


11 


2.5 SDi 


0.8668 


1.79 X 10- 


12 



There is no acceptable trade-off available here, and thus regeneration is not viable in this 
problem. We note two caveats on our negative conclusion: (i) that we only considered D to 
be a hypercube centered at u, and (ii) that we considered only the approach recommended 



bv lMvkland et al.l (|l995l ). We are not aware of any other literature on this problem. It is not 
our contention that regeneration is impossible in high-dimensional Metropolis random walks. 
We merely hope to convey that it is a challenging proposition, and suggest it as a possible 
subject for future research. 
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A Proof of Theorem [3] 



First note that e < 1 as can be seen by taking x = y. Now, for x,y G X, define 

d 

P{x,y) = Y[pi{yi)ai{{y[i-i],x^'^),yi)- 

i=l 

We will show that there exists p > such that f3{x,y) > pir^y) for all x, y S X and thus, since 
P{x, dy) > (3{x, y)dy, that the chain is uniformly ergodic. 

For any x and y we can partition the index set {1, . . . ,d} into Ji and Iq defined by 



h 
h 



o{x,y) = |i : ai((y[,_i],xW),yi) < l| ; 



and write 



(23) 



P{x,y) = llMyM{y[,^^,x^^+\y.^ = J] P^iy^) U P^i^^) ""iril! • (22) 

Now fix X and y in X. There exists an even integer k such that 

/o(x,y) = {fio + 1, . . . ,(ii,(i2 + 1, • • • ,^3,(^4 + 1, • • • ,^5, ,4 + 1, • • -,4+1} 

h{x,y) = {di + 1, . . . ,4,4 + 1, • • • ,4,4 + 1, • • • ,4, • • • ,4-1 + 1, ... ,4} 

where = 4 < < 4 < 4 < . . . < 4 < 4+i = d- 

This representation of Iq and Ii appears to make sense only for the case where ai and ad 
are both less than 1, but in fact can be shown to accommodate the general case by allowing 
the first or last batch of Iq to be null. We will explain further here, but first we must explain 
how k and the dj are determined. 

For each j = 1, ... ,k — 1, dj + 1 is the index of a switch between a = 1 and a < 1: If j is 
even the switch is from a^^ = 1 to a^^+i < 1 and if j is odd then a^^ < 1 and a^^^i = 1. If 
ai < 1 and < 1, then k is the total number of such switches, which must be even. If only 
one of ai and ad is less than 1, there must he k — 1 such switches, and if ai = = 1 then 
the number of switches is A; — 2. We can define di and 4 a bit more precisely as 



di 

and 

dk 



if all Ui < 1 

min{z : aj = 1} — 1 otherwise 

4 = if all Oj < 1 

max{i : ai = 1} otherwise 
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Thus the representation in ()23p is completely general, since either or both of di = (in which 
case the set {do + 1, . . . , di} is null) and dk = d (in which case {d^ + 1, . . . , dk+i} is null) are 
allowed. A null contribution to Iq is easily recognized: If di = 0, as is the case where ai = 1, 
the first batch of indices in Iq is {dQ + 1, . . . , di} = {1,0}, which should be considered null; if 
dk = d, as is so when = 1, the last batch of indices in Iq is {d^ + 1, . . . , d^} = {d + l,d}, 
which should also be considered null. 

In the special case where all Oj are less than 1, then k = and di = d; if all ai are equal 
to 1, then k = 2, di = and d2 = d. 

We now find p > such that /3(x, y) > p7r{y) for any x,y £ X. First, suppose x and y are 
such that all are less than 1. Then /i = 0, and 



thus taking care of that special case. Now suppose that at least one Oj = 1. We will 
need a new notation which we introduce here. Given = do < di < d2 < ■ ■ ■ < dk < 
dk+i = d, we partition the vector z € X = Xi x • • • x as z = {zq, zi,Z2, ■ ■ ■ , Zk) where 



P{x,y) =p{x) 



> 5TT{y) 



■n{x) 



= {zdj+i-, ■ ■ ■ , Zdj+i)- Thus we are changing our use of a subscript from indicator of a single 
component to indicator of "batch." Define zy] = (zq, zi, . . . , zj) and zt-'l = {zj, Zj+i, . . . , Zk) 
for J = 0, 1, . . . , fc. Let and zl^^+^l be nuh. By 




> 6e''/^TT{y[k-i],Xk) 
= 5e''/\{y) 



'^{y[k-i],xk) 
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where the first inequahty fohows from the assumption that p{x) > 6tt{x) and the rest follow 
from ([6|). We have seen that the number of switches is either k, k — 1, or k — 2; thus it must 
be that k < d + 1. But if the number of switches is A; — 2, that means the first batch of 
Iq is null and we have one less e on the right hand side of the final inequality above. Thus 
P{x,y) > pT^{y) holds with p = 5eL'i/2j^ g^^id thus the chain is uniformly ergodic. 



B MCMC standard error estimation via regenerative simula- 
tion 

Let ^X^^\ X^^\ X^'^\ . . be a Harris ergodic Markov chain on (X,i3) with stationary dis- 
tribution vr and let 5 : X ^ Further let <5(")) : n = 0, 1, 2, . . .} be an associated 
split chain with (5^°) = 1. Define = tq < n < r2 < . . . by = rnin^k > r^._i : = 1} 
for r = 1,2,...; thus indicates the rth regeneration time. We call the path taken by 
the chain between regeneration times r,._i and the rth tour; let = — Tr-i and 
Sr = Yl'k=Tr-i+i di-^^'^^) denote the tour length and tour sum, respectively, for r = 1, 2, . . .. It 
follows from the split chain construction that the pairs {Nr,Sr) are i.i.d., since each is based 
on a different tour. 

Suppose we run the split chain for a total of R regenerations (and hence observe R tours). 
A natural estimator of Ej^g is 

Assume E{Nr) < 00 and E{\Sr\) < 00. Then, as -R ^ 00, gn^Rs = Qtr E^^g by the strong 
law of large numbers. Further, \iE{N'^) < 00 and E(S^ ) < 00 , then ^/R{gT-j^—E.„g) N(0,^p 



as i? — > 00 by the central limit theorem. iHobert et al.l (|2002l ) proved that 



1 ^ 

■■= ^2Y.^Sr - Nr-gr,f (25) 

r=l 

is a consistent estimator of Finally, an approximate 100(1 — q)% confidence interval for 
E^^g is given by 

-gr,±z"''^ (26) 



where Pr(Z < z^) = \ — p \i Z has the standard normal distribution. 
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Trace plot for MHIS 
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Figure 1: Trace plot and estimated autocorrelation function of for the MHIS 

the example of Section \4-l\ 
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Trace plot for CWIS 
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Figure 2: Trace plot and estimated autocorrelation function of for the CWIS 

the example of Section \4-l\ 
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Half-width 


Coverage 


Number 


of tours 


Sampler 


Chain length 


Mean 


Std Dev 


Rate 


Std Error 


Mean 


Std Dev 


MHIS 


n = 5000 


0.1494 


0.0093 


0.9495 


0.0015 


1944.47 


34.65 


CWIS 




0.1113 


0.0069 


0.9494 


0.0016 


958.01 


27.31 


MHIS 


n = 1000 


0.3321 


0.0293 


0.9455 


0.0016 


388.03 


15.54 


CWIS 




0.2472 


0.0278 


0.9441 


0.0016 


190.88 


12.20 



Table 1: Half-widths and coverage rates of nominal 95% confidence intervals for E{n/ \/d\y) 
in the example of Section [4.11 Based on 20,000 replications. 





c. 


overage 


Half-width 


Total chain length 


Regenerations 


Rate 


Std Error 


Mean Std Dev 


Mean Std Dev 


i? = 50 


0.928 


0.0082 


0.01172 0.00189 


533,668 72,893 


i? = 25 


0.920 


0.0086 


0.01600 0.00355 


267,392 52,479 


= 10 


0.862 


0.0109 


0.02377 0.00836 


107,316 35,258 



Table 2: Coverage rates and interval half-widths of nominal 95% confidence intervals for 
Q{9;9) in the logit-normal example of Section [4. 2[ Based on 1000 replications. 
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Trace plot for Metropolis random walk 
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Figure 3: Partial trace plot and autocorrelation function for the Markov chain {lc{9;y,u^^^)} 
generated by the Metropolis random walk sampler in the logit-normal example of Section\4-S\ 
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Trace plot for MHIS 
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Figure 4: Partial trace plot and autocorrelation function for the Markov chain {lc{9;y,u^^^)} 
generated by the Metropolis-Hastings independence sampler in the logit-normal example of 
Section \4.S\ 
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Trace plot for CWIS 
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Figure 5: Partial trace plot and autocorrelation function for the Markov chain {lc{9;y,u^^^)} 
generated by the component-wise independence sampler in the logit-normal example of Section 
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